METHOD FOR VIDEO FORMAT DETECTION



BACKGROUND 



[0001] The invention relates generally to the field of data processing. More specifically, the 
invention relates to a method for detecting the format of video data and processing the video data 
based on the detected format. 

[0002] Various formats exist for television data. Much of the National Television System 
Committee (NTSC) formatted content broadcast on terrestrial or cable television is entirely or 
partially shot on film (movies, sitcoms, music videos, commercials, etc.) and then later converted
to NTSC video format by 3:2 pulldown. 3:2 pulldown is the process by which 24 frames/sec 
film content is converted into 59.94 fields/sec video. As used herein, "3:2 content," "3:2 
pulldown clips," "3:2 sequences," and "telecine sequences" indicate video data that was 
generated from film using the 3:2 pulldown process. "Standard video" will be used to indicate
video data that did not originate from film. 

[0003] When converting from film to video using the 3:2 pulldown process, two film frames, 
102 and 104, generate five interlaced video fields, 106, 108, 110, 112 and 114, as depicted in
Fig. 1. In addition, the film speed is slowed down by 0.1% to 23.976 (24/1.001) frames/sec in order to
account for the fact that NTSC video runs at 29.97 frames/sec. The process of converting from 
3:2 content back to film is called inverse 3:2 pulldown (also known as inverse telecine). 
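
The field cadence and rate arithmetic described above can be illustrated with a minimal sketch. It assumes that film frames alternately contribute three and two fields and that field parity simply alternates across the output stream; the function name `pulldown_32` and the tuple layout are hypothetical and are not taken from Fig. 1.

```python
from fractions import Fraction


def pulldown_32(num_film_frames):
    """Return the output fields as (film_frame_index, parity) tuples.

    Assumption: even-numbered film frames contribute 3 fields, odd-numbered
    frames contribute 2, and parity alternates top/bottom field by field.
    """
    fields = []
    for frame in range(num_film_frames):
        count = 3 if frame % 2 == 0 else 2
        for _ in range(count):
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields


fields = pulldown_32(2)

# Rate arithmetic: five fields are produced for every two film frames.
film_rate = Fraction(24000, 1001)        # 24/1.001 frames/sec
field_rate = film_rate * Fraction(5, 2)  # fields/sec after pulldown
```

In this sketch the third field repeats the first (same film frame, same parity), which is the repeated field that the detection stages look for.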
[0004] The two primary applications of inverse 3:2 pulldown are display and compression. In 
terms of display, inverse 3:2 pulldown facilitates the optimal display of film content on a 
progressive monitor (such as a large-screen rear-projection system, a liquid crystal display, or a
flat panel plasma display) because it allows each film frame to be displayed in its original
progressive form for a uniform and consistent duration of 1/24th of a second. In terms of
compression, inverse 3:2 pulldown results in better compression efficiency and reduced 
computational complexity in a video encoder because it allows telecine content to be encoded at 
24 frames/sec format rather than at 59.94 fields/sec. 

[0005] Known methods can detect 3:2 content and extract the original film frames where the
repeated field pattern is uninterrupted and distinct. However, there are several factors that 
produce unreliable results in known systems and methods for detecting 3:2 content or other video 
formats. 
[0006] For example, known techniques do not reliably distinguish between repeated fields and
non-repeated fields when the motion of objects is very small (providing little field-to-field
positional difference) and/or the video noise is high. Moreover, it is not uncommon that Digital
Video Disks (DVDs) and other video sources contain both standard video and 3:2 content. In
such cases, it is difficult for known algorithms to detect the location of standard video/3:2
content transitions and respond accordingly. In addition, with known techniques, standard video
that is inverted can result in highly objectionable artifacts resulting from weaving two fields from
different time instants. Moreover, the phase of the 3:2 pulldown pattern may change when two
different 3:2 pulldown clips are spliced together, e.g., at a scene transition. Ambiguous
situations arise, for example, when one 3:2 pulldown clip is transitioned to another 3:2 pulldown
clip via a fade. This editing can result in the superposition of two 3:2 patterns with different
phases, which cannot be unambiguously inverted using known methods. Furthermore, some
content providers broadcast 60 fields/sec video that has been generated using a non-standard 3:2
pulldown approach. One example of such a process is varispeed, which alters the run time of a
program. Simply repeating film frames would cause stutter, so, instead, the 3:2 pattern is
modified to ensure a smooth temporal rate. Known detection methods are ill-suited to detect
such non-standard video formats.

[0007] Therefore, a need exists for a system and method that can produce more reliable detection 
of video format where, for example, the source video is noisy, or where the video data pattern is
interrupted by the use of splices or transitions, or where the video is otherwise altered by a 
content provider. 



SUMMARY OF THE INVENTION 



[0008] The invention provides methods and code for better detecting 3:2 content or other video 
formats. In one respect, embodiments of the invention improve the way in which fields of video 
data are compared to each other. In another respect, embodiments of the invention provide 
pattern matching techniques and code for processing the field difference data that results when
video data is compared. In yet another respect, embodiments of the invention facilitate the 
formation of field pairs required by the inverse telecine process. 

[0009] The features and advantages of the invention will become apparent from the following
drawings and detailed description. 
BRIEF DESCRIPTION OF THE DRAWINGS

[0010] Embodiments of the invention are described with reference to the following drawings,
wherein: 

Fig. 1 is an illustration of 3:2 pulldown, according to the related art; 

Fig. 2 is a block diagram of a video format detection algorithm, according to one
embodiment of the invention;

Fig. 3 is a block diagram of a field difference vector formation algorithm, according 
to one embodiment of the invention; 

Fig. 4 is a block diagram of a field difference engine algorithm, according to one 
embodiment of the invention; 

Fig. 5 is a block diagram of a field-difference pattern matching algorithm, according 
to one embodiment of the invention; 

Fig. 6 is a block diagram of a correlation engine, according to one embodiment of the 
invention; 

Fig. 7 is a block diagram of film confidence calculation, according to one 
embodiment of the invention; 

Fig. 8 is a process flow diagram for a first stage of 3:2 content/standard video 
decision, according to one embodiment of the invention; 
Fig. 9 is an illustration of possible telecine phases and field pair formations, 
according to one embodiment of the invention; 

Fig. 10 is a block diagram of an algorithm for field pair formation, according to one 
embodiment of the invention; and 

Fig. 11 is an illustration of pseudo-code description of field-pair formation, according 
to one embodiment of the invention. 
DETAILED DESCRIPTION



[0011] This section describes stages of an inverse telecine process: field-difference vector 
formation, pattern matching, and two alternative stages for field-pair formation. A top-level 
block diagram of the overall process is shown in Fig. 2. 

[0012] Input fields ($X_n$) are fields of a video data stream. Output field pairs ($Y_m$) are fields of a
video signal that have been paired for the purpose of producing film frames as part of an inverse 
telecine process. The process depicted in Fig. 2 advantageously discriminates between 3:2 
content and standard video. 

[0013] Field-difference vector formation 202 is used to identify repeated fields in the video data
stream, such as film frame 0 (even field), film frame 2 (even field), and film frame 4 (even field)
in Fig. 1. The output can be expressed as a vector $\mathbf{d}_n$. Pattern matching step 204 is a technique
for comparing a sequence of field differences to known data formats. The output of the pattern
matching step 204 is a state variable, $C_n$, and a phase estimate, $\theta_n$. $C_n$ is used in conditional step
206 to determine whether the video data is 3:2 content. $\theta_n$ describes the phase of 3:2 content.

[0014] If it is determined in conditional step 206 that the input is 3:2 content, then the process
advances to field pair formation step 208 to pair output fields related to frames of film. For
example, with reference to Fig. 1, video fields 112 and 114 might be paired. In addition, field
pair formation step 208 sets a progressive frame flag and a repeat first field flag. The
progressive frame flag indicates a pair of output fields to be used in generating a frame of film.
The repeat first field flag indicates a field pair having a repeated first field. For instance, with
reference to Fig. 1, the repeat first field flag would be set for the field pair consisting of fields
106 and 108 to indicate that there was a repeated first field 110 in the 3:2 content.
[0015] If it is determined in conditional step 206 that the input is standard video, then the 
process advances to step 210 for the output of video field pairs. In this instance, the progressive 
frame flag and the repeat first field flag are set to zero, since they are only applicable to 3:2 
content. 

[0016] Each of the steps depicted in Fig. 2 and described below may be used separately, or 
combined with the other steps described herein. Process steps may also be combined with other 
format detection algorithms. Moreover, the techniques described herein are applicable to
processing operations other than inverse telecine processing. 

[0017] The sub-headings used below are for organizational convenience, and are not meant to
limit the disclosure of any particular feature to any particular portion of this specification. 

Field-Difference Vector Formation (Step 202) 

[0018] This section describes the formation of a field-difference vector sequence, $\{\mathbf{d}_n\}$, which is
formed from an input field sequence $\{X_n\}$. The purpose of the field difference sequence $\{\mathbf{d}_n\}$ is
to identify repeated fields in 3:2 content. As shown in Fig. 3, the field difference sequence $\{\mathbf{d}_n\}$
may be obtained by executing field difference step 302, vector formation step 304, clamping step 
306 and normalization step 308 in series. A more detailed description of each is provided below. 

Field Difference (Step 302) 

[0019] The field difference step 302 operates on two input fields, $X_n$ and $X_{n-2}$, and outputs a
field-difference metric, $D_{n,n-2}$.

[0020] The difference metric $D_{n,n-2}$ is a single scalar value that indicates the "closeness" of the
two fields. The smaller the value, the more likely the two input fields are duplicates. A simple 
sum of absolute pixel-by-pixel differences between two input fields may not suffice as a 
mechanism to distinguish repeated and non-repeated fields because it cannot sufficiently 
discriminate slight motion (indicating a non-repeated field) from noise. An improved field 
difference engine, according to one embodiment of the invention shown in Fig. 4, addresses this 
concern. 

[0021] First, in optional pre-processing steps 402 and 404, the input fields are cropped at the left,
right, top and bottom to avoid edge artifacts. To mitigate the effects of slight motion and noise, 
we filter and sub-sample each input field by a factor of L in the horizontal and vertical 
dimension, where L is a power of 2. See steps 406 and 408. This is implemented by treating the 
input field as a texture and creating an L-level mipmap with a simple two-tap averaging filter 
(although more sophisticated filtering is also possible). The final mipmap level is used to 
compute the field-difference. Filtering and sub-sampling in steps 406 and 408 also reduce the 
number of Central Processing Unit (CPU) cycles required to compute the field difference. 
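
The filtering and sub-sampling of steps 406 and 408 might be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the "two-tap averaging filter" is interpreted as a two-tap average in each dimension (a 2x2 box average), each level halves both dimensions, and the helper names `downsample_once` and `build_mipmap` are hypothetical.

```python
def downsample_once(field):
    """Halve width and height by averaging each 2x2 neighbourhood.

    Assumption: a two-tap average applied horizontally and vertically.
    """
    rows, cols = len(field) // 2, len(field[0]) // 2
    return [
        [
            (field[2 * r][2 * c] + field[2 * r][2 * c + 1]
             + field[2 * r + 1][2 * c] + field[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_mipmap(field, levels):
    """Return the coarsest level after `levels` successive reductions."""
    for _ in range(levels):
        field = downsample_once(field)
    return field


example_field = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 6, 6],
    [2, 2, 6, 6],
]
coarse = build_mipmap(example_field, levels=1)
```

Only the final level would then be passed on to the field-difference computation.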
[0022] Next, the image is segmented into pixel groups (step not shown), and pixel-by-pixel
differences are determined between fields $X_n$ and $X_{n-2}$ in step 410. In one embodiment, the pixel
groups are blocks having W x H pixels, for example 32 x 16 pixels.

[0023] In step 412, the sum of absolute pixel differences (SAD) between the two fields is
computed for each group of pixels. For sequences with small amounts of motion, only a few
groups of pixels may exhibit large field-to-field differences. Consequently, in one embodiment,
the pixel group scores are selected in step 414, and only the scores of the top-ranked groups of
pixels, e.g., the top 10%, are combined and used to compute the field difference metric, $D_{n,n-2}$, in
step 416.

[0024] Optimum pixel group size and pixel group selection rules can be determined empirically. 
By selecting and using a small subset of the pixel groups, the algorithm can more easily 
distinguish slight motion from noise. In another embodiment, each of the pixel group scores 
may be compared to a predetermined threshold value to sort the pixel group scores. In yet 
another alternative embodiment, all pixel group scores are used. 

[0025] Thus, by calculating field difference using blocks or other pixel groups, field differences 
can be detected, even where, for example, most of the background is unchanged. 
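
A minimal sketch of the block-based field difference of steps 410 through 416 is given below. It assumes that the fields are equal-sized two-dimensional lists of luma values, that the selected block scores are "combined" by summation, and that at least one block is always kept; the function name and parameters are hypothetical.

```python
import math


def field_difference(field_a, field_b, block_w, block_h, top_fraction=0.10):
    """Combine the largest per-block SAD scores into one scalar metric."""
    height, width = len(field_a), len(field_a[0])
    scores = []
    for y0 in range(0, height, block_h):
        for x0 in range(0, width, block_w):
            sad = 0
            for y in range(y0, min(y0 + block_h, height)):
                for x in range(x0, min(x0 + block_w, width)):
                    sad += abs(field_a[y][x] - field_b[y][x])
            scores.append(sad)
    # Rank the block scores and keep only the top fraction (assumed: sum).
    scores.sort(reverse=True)
    keep = max(1, math.ceil(top_fraction * len(scores)))
    return sum(scores[:keep])


flat = [[0] * 4 for _ in range(4)]
moved = [row[:] for row in flat]
moved[0][0] = 7   # motion confined to the top-left block
moved[3][3] = 3   # weaker change in the bottom-right block
metric = field_difference(flat, moved, block_w=2, block_h=2, top_fraction=0.25)
```

Because only the highest-scoring blocks contribute, a small moving object is not averaged away by a large static background.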

Field Difference Vector Processing (Steps 302, 304, 306, and 308) 

[0026] Since the spacing between repeated fields for 3:2 telecine content is 2, the expectation is
that (relatively speaking) $D_{n,n-2}$ is small for repeated fields and large otherwise.
Accordingly, we construct a repeat-field-difference sequence, $d_n$, where $d_n = D_{n,n-2}$.

[0027] As an intermediate step 304, we form a corresponding sequence of length-M
difference vectors, given by $\mathbf{d}_n = [d_n, d_{n-1}, \ldots, d_{n-M+1}]^T$. Next, we construct the vector
sequence $\mathbf{d}'_n$ in step 306 by clamping the outlier field differences in the vector $\mathbf{d}_n$. More
specifically, define $d_n^{*}$ as the $N$-th largest element of $\mathbf{d}_n$. Then, we construct $\mathbf{d}'_n$ according to:

[0028] $[\mathbf{d}'_n]_i = \min\left([\mathbf{d}_n]_i,\; d_n^{*}\right), \quad i = 0, 1, \ldots, M-1$

[0029] Finally, in step 308, we scale these vectors to zero-mean and unit variance, resulting in
the normalized vector sequence, $\bar{\mathbf{d}}_n$, which is computed according to:

[0030] $\bar{\mathbf{d}}_n = \dfrac{\mathbf{d}'_n - \mu_{\mathbf{d}'_n}\mathbf{1}}{\sigma_{\mathbf{d}'_n}}$

[0031] In vector notation, the mean and standard deviation of a vector, $\mathbf{v}$, are defined by
$\mu_{\mathbf{v}} = \frac{1}{M}\mathbf{1}^T\mathbf{v}$ and $\sigma_{\mathbf{v}}^2 = (\mathbf{v} - \mathbf{1}\mu_{\mathbf{v}})^T(\mathbf{v} - \mathbf{1}\mu_{\mathbf{v}})$, respectively, where $\mathbf{1} = [1, 1, \ldots, 1]^T$.
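
The clamping and normalization of steps 306 and 308 can be sketched as follows. The sketch assumes that outliers are clamped to the N-th largest element of the window and that the normalization scales the centered vector to unit Euclidean norm, so that a later inner product with a similarly normalized vector cannot exceed 1.0. The names `clamp_outliers`, `normalize` and `n_rank` are hypothetical.

```python
import math


def clamp_outliers(d, n_rank):
    """Clamp every element to the n_rank-th largest value (1-based rank)."""
    limit = sorted(d, reverse=True)[n_rank - 1]
    return [min(x, limit) for x in d]


def normalize(v):
    """Shift to zero mean and scale to unit Euclidean norm (assumed scaling)."""
    mean = sum(v) / len(v)
    centered = [x - mean for x in v]
    norm = math.sqrt(sum(x * x for x in centered))
    if norm == 0.0:
        return [0.0] * len(v)  # constant vector: nothing to normalize
    return [x / norm for x in centered]


# Ten field differences; 900.0 and 40.0 stand in for a scene cut and a burst
# of motion, 0.5 and 0.4 for repeated fields (illustrative values only).
d = [5.0, 40.0, 6.0, 5.0, 0.5, 5.0, 6.0, 900.0, 5.0, 0.4]
d_clamped = clamp_outliers(d, n_rank=3)
d_norm = normalize(d_clamped)
```

Clamping keeps a single very large difference from dominating the mean and scale, so the small values at the repeated fields remain distinguishable after normalization.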

Pattern Matching (Step 204) 

[0032] The goal of the pattern matching step 204 is to make an accurate 3:2 content/standard
video decision, $C_n$, and a phase estimate, $\theta_n$, which describes a position within a known pattern.

In one embodiment of step 204, basis vectors are used to describe known patterns of 3:2 content 
and standard video formats. 

[0033] Many existing inverse 3:2 techniques limit their observations to one or two field
differences when detecting 3:2 content. Often, the repeat field decision is based on a simple 
threshold of the field difference, which is very susceptible to noise. For this type of approach, it 
is difficult to strike a balance between responding quickly to 3:2 content/standard video 
transitions and inadvertently responding when no transition really exists due to noisy 
measurements. As a result, clumsy heuristics are often required to prevent an algorithm from 
incorrectly switching back and forth between standard video and 3:2 content.
[0034] In contrast with existing techniques, embodiments of pattern matching step 204 observe a 
large history of field differences over a given window of the repeat-field-difference
sequence, $\{d_n\}$. Considering a large number of field differences simultaneously mitigates the
effects of noise and affords a larger context for the algorithm to operate. In addition, by
introducing a small delay, the algorithm can look into the past as well as the future when making
the 3:2 content/standard video decision and determining the 3:2 phase. Moreover, by
considering all possible translations of the window within the sequence of field differences, the
algorithm can often pinpoint the location of 3:2 content/3:2 content or standard video/3:2
content transitions more precisely.

[0035] The embodiment of step 204 illustrated in Fig. 5 includes a correlation step 502, a step 
for handling splice points 504, a first decision step 506, and a second decision step 508, all 
coupled in series. Steps 502, 504, 506 and 508 may be used separately or in combination, and
are described further below. 
Overview of Correlation (Step 502)

[0036] In an embodiment of correlation step 502 illustrated in Fig. 6, the normalized field
difference sequence, $\bar{\mathbf{d}}_n$, is fed into a bank of K correlators for film 602 and video 604, resulting
in K output sequences given by $\{R_{0,n}, R_{1,n}, \ldots, R_{K-1,n}\}$. Mathematically, we compute the
output, $R_{k,n}$, of the field difference sequence with correlator $k$ by taking an inner product of
vector $\bar{\mathbf{d}}_n$ with the basis vector, $\mathbf{b}_k$, according to

[0037] $R_{k,n} = \langle \bar{\mathbf{d}}_n, \mathbf{b}_k \rangle = \bar{\mathbf{d}}_n^T \mathbf{b}_k$

[0038] Recall the Cauchy-Schwarz Inequality, which states $\langle \mathbf{x}, \mathbf{y} \rangle \le \|\mathbf{x}\| \cdot \|\mathbf{y}\|$. By normalizing both
the field difference and basis vectors, i.e., $\|\bar{\mathbf{d}}_n\| = \|\mathbf{b}_k\| = 1.0$, we can constrain the correlation
output range such that $R_{k,n} \le 1.0$. The goal of normalization is to keep $R_{k,n}$ invariant to changes
in the relative quantity and size of motion in the input video fields.

Basis Vector Construction 

[0039] The K basis vectors, $\{\mathbf{b}_0, \mathbf{b}_1, \ldots, \mathbf{b}_{K-1}\}$, correspond to each of the correlators 602 and 604
and are constructed to be of zero-mean and unit standard deviation. In one embodiment, the
basis vector length, M, is assumed to be a multiple of 5, corresponding to the field difference
period of telecine content.

[0040] The first five of these basis vectors, $\{\mathbf{b}_0, \mathbf{b}_1, \ldots, \mathbf{b}_4\}$, represent the "idealized" repeat field
patterns for the five possible phases of 3:2 content, and the remaining K-5 vectors,
$\{\mathbf{b}_5, \mathbf{b}_6, \ldots, \mathbf{b}_{K-1}\}$, represent the repeat field patterns for standard video.

[0041] In both cases, the normalized basis vectors are derived from the vectors, $\{\mathbf{b}'_0, \mathbf{b}'_1, \ldots, \mathbf{b}'_{K-1}\}$,
whose elements are restricted to the values 0.0 or 1.0. The normalized basis
vectors, $\{\mathbf{b}_0, \mathbf{b}_1, \ldots, \mathbf{b}_{K-1}\}$, are computed by scaling the vectors, $\{\mathbf{b}'_0, \mathbf{b}'_1, \ldots, \mathbf{b}'_{K-1}\}$, to zero mean
and unit variance, according to:

[0042] $\mathbf{b}_k = \dfrac{\mathbf{b}'_k - \mu_{\mathbf{b}'_k}\mathbf{1}}{\sigma_{\mathbf{b}'_k}}$



[0043] For 3:2 content, the five un-normalized basis vectors $\{\mathbf{b}'_0, \mathbf{b}'_1, \ldots, \mathbf{b}'_4\}$ are constructed by
considering the five possible phases of telecine content and assigning 0.0 to the locations for
each repeat field, and 1.0 otherwise. For example, for M=10, the un-normalized vectors
$\{\mathbf{b}'_0, \mathbf{b}'_1, \ldots, \mathbf{b}'_4\}$ are given by:

$\mathbf{b}'_0 = [0,1,1,1,1,0,1,1,1,1]^T$

$\mathbf{b}'_1 = [1,0,1,1,1,1,0,1,1,1]^T$

$\mathbf{b}'_2 = [1,1,0,1,1,1,1,0,1,1]^T$

$\mathbf{b}'_3 = [1,1,1,0,1,1,1,1,0,1]^T$

$\mathbf{b}'_4 = [1,1,1,1,0,1,1,1,1,0]^T$
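
The five un-normalized film basis vectors listed above follow a simple rule, which might be generated as in the sketch below. The function name `film_basis_unnormalized` is hypothetical; the sketch only assumes, as stated in the text, that M is a multiple of 5.

```python
def film_basis_unnormalized(m):
    """Return five length-m vectors of 0.0/1.0, one per 3:2 phase.

    Element i of the vector for a given phase is 0.0 where a repeated field
    is expected (i % 5 == phase) and 1.0 elsewhere.
    """
    if m % 5 != 0:
        raise ValueError("m is assumed to be a multiple of 5")
    return [
        [0.0 if i % 5 == phase else 1.0 for i in range(m)]
        for phase in range(5)
    ]


basis = film_basis_unnormalized(10)
```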

[0044] For standard video, the un-normalized video basis vectors, $\{\mathbf{b}'_5, \mathbf{b}'_6, \ldots, \mathbf{b}'_{K-1}\}$, are
constructed by considering a subset (or potentially all) of the remaining, non-zero combinations
of 0.0 and 1.0. For example, if M = 10, we have $K = 2^{10} - 1 = 1023$ possible non-zero
combinations of 0.0 and 1.0, resulting in un-normalized vectors given by $\{\mathbf{b}'_5, \mathbf{b}'_6, \ldots, \mathbf{b}'_{1022}\}$.



Basis Vector Correlation (Step 502) 



[0045] To determine whether the input fields are 3:2 content or standard video, we generate 3:2
content and standard video confidence metrics by selecting the maximum outputs from the
respective banks of film and video correlators according to:

[0046] $R'_{film,n} = \max_{0 \le k \le 4} \{R_{k,n}\}$,

and

[0047] $R'_{video,n} = \max_{5 \le k \le K-1} \{R_{k,n}\}$.

[0048] These values are simply the correlation outputs for the best 3:2 content and standard
video basis vector matches, and in a sense, can be viewed as likelihood measures for 3:2 content
and standard video. We similarly record the index for the best 3:2 content basis vector
(corresponding to $R'_{film,n}$) according to:

[0049] $\theta'_n = \arg\max_{0 \le k \le 4} \{R_{k,n}\}$,

which can be viewed as a first-pass estimate of the 3:2 phase for 3:2 content.
[0050] Accordingly, by expanding the window of consideration, and by comparing the field
difference sequence to 3:2 content basis vectors and/or standard video basis vectors, a more 
reliable prediction can be made for identifying the format of the source video data. 
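
The correlation and first-pass selection described in this section might be sketched as follows. The sketch assumes unit-norm normalization of both the field-difference window and the basis vectors, and it uses only two hypothetical video basis vectors rather than the full set; all function and variable names are illustrative.

```python
import math


def normalize(v):
    """Zero-mean, unit-Euclidean-norm version of v (assumed scaling)."""
    mean = sum(v) / len(v)
    centered = [x - mean for x in v]
    norm = math.sqrt(sum(x * x for x in centered))
    return [x / norm for x in centered] if norm else [0.0] * len(v)


def correlate(d_norm, basis_vectors):
    """Inner product of d_norm with each normalized basis vector."""
    return [sum(a * b for a, b in zip(d_norm, b_k)) for b_k in basis_vectors]


def first_pass(d, film_basis, video_basis):
    """Return (film confidence, video confidence, first-pass phase)."""
    d_norm = normalize(d)
    film_scores = correlate(d_norm, [normalize(b) for b in film_basis])
    video_scores = correlate(d_norm, [normalize(b) for b in video_basis])
    r_film = max(film_scores)
    r_video = max(video_scores)
    phase = film_scores.index(r_film)
    return r_film, r_video, phase


film_basis = [[0.0 if i % 5 == p else 1.0 for i in range(10)] for p in range(5)]
# Hypothetical subset of the video basis vectors, for illustration only.
video_basis = [
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 1, 0, 1, 1, 0],
]
# Illustrative window with small differences at positions 2 and 7.
d = [9.0, 8.0, 0.3, 7.5, 9.2, 8.1, 8.8, 0.2, 7.9, 8.4]
r_film, r_video, phase = first_pass(d, film_basis, video_basis)
```

For this window the best film match is the phase whose zeros fall on positions 2 and 7, and the film confidence exceeds the video confidence.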

Handling Splice Points (Step 504) 

[0051] A good match with a 3:2 content basis vector requires an uninterrupted history of telecine
content over the duration of the field-difference vector window. Edit or splice points between
two telecine sequences may violate this requirement if the 3:2 content/3:2 content transition
disrupts the normal 3:2 pattern. Across such a splice point, a good match with one of the 3:2 
content basis vectors is not possible since the transition causes a discontinuity in the expected 
repeat-field difference pattern. 

[0052] To remedy the problem of detecting 3:2 content across a splice point, according to one
embodiment of the step for handling splice points 504 detailed in Fig. 7, we calculate an
improved 3:2 content confidence metric, $R_{film,n}$, by finding the best basis vector match over a
sliding window of length M according to:

[0053] $R_{film,n} = \max_{0 \le m \le M-1} \{R'_{film,n+m}\}$

[0054] This approach guarantees at least one good basis vector match if splice points are
separated by at least M fields. The index corresponding to the best translation, given by:

[0055] $m^{*}_n = \arg\max_{0 \le m \le M-1} \{R'_{film,n+m}\}$,

can be used to improve the 3:2 phase estimate by using the original phase estimate corresponding
to $R_{film,n}$ and adjusting it by the delay, $m^{*}_n$, according to:

[0056] $\theta_n = \mathrm{mod}(\theta'_{n+m^{*}_n} + M - m^{*}_n,\; 5)$.

[0057] Note that the search over a sliding window of size M introduces a delay of M fields to the 
overall inverse 3:2 processing. 
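
The sliding-window search for the best film match might be sketched as follows. The sketch assumes that the window for field n covers the first-pass results for fields n through n+M-1 (which is what makes the M-field delay necessary) and that the phase is adjusted back by the offset of the best match, modulo 5. The function name and argument layout are hypothetical.

```python
def refine_film_confidence(r_film_first, phase_first, n, m_len):
    """Search the look-ahead window n .. n+m_len-1 for the best film match.

    r_film_first[i] and phase_first[i] hold the first-pass film confidence
    and phase for field i. Returns (confidence, best_offset, adjusted_phase).
    """
    last = min(n + m_len, len(r_film_first))
    window = r_film_first[n:last]
    best_offset = max(range(len(window)), key=lambda m: window[m])
    confidence = window[best_offset]
    # Assumed adjustment: step the phase back by the offset, modulo 5.
    phase = (phase_first[n + best_offset] + m_len - best_offset) % 5
    return confidence, best_offset, phase


# Illustrative first-pass outputs for twelve fields.
r_first = [0.2, 0.3, 0.95, 0.4, 0.1, 0.2, 0.3, 0.2, 0.1, 0.3, 0.99, 0.5]
phases = [2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3]
result_n0 = refine_film_confidence(r_first, phases, n=0, m_len=10)
result_n1 = refine_film_confidence(r_first, phases, n=1, m_len=10)
```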

[0058] For standard video, there is no need to search over all possible sliding windows, so
instead we define $R_{video,n}$ by selecting the best basis vector match using a window centered at
time n, i.e.,

[0059] $R_{video,n} = R'_{video,n+M/2}$.

[0060] Thus, pattern matching against a set of possible splice vectors significantly improves the
ability of algorithms to detect splice edits in the source video. As illustrated above, the phase 
index can also be determined with this technique. 



1st and 2nd Decision (Steps 506 and 508)



[0061] To determine whether the input fields belong to film or video content, the inverse 3:2
algorithm applies a variety of heuristics to the relative and absolute magnitudes of the phase
confidences, $R_{film,n}$ and $R_{video,n}$, the consistency of the phase estimates, $\theta_n$, the presence of
dropped frames, and past mode decisions. In the embodiment illustrated in Fig. 5, these inputs
are fed into a two-pass algorithm whose output is a 3:2 content/standard video decision,
denoted by $C_n$, that takes on one of three values: 3:2 content (FILM); 3:2 content in transition
(FILM IN TRANSITION); and standard video (VIDEO). The output $C_n$, along with the film
phase estimate $\theta_n$, determines how the input fields are processed into output fields or frames (as
described below).



1st Decision (Step 506)



[0062] The 1st decision step 506 uses phase confidences, $R_{film,n}$ and $R_{video,n}$, to make an initial 3:2
content/standard video determination denoted by $C'_n$. A flow chart describing the logic for this
stage, according to one embodiment of step 506, is shown in Fig. 8.

[0063] As shown therein, the process begins in step 802 by processing the first input field in 
temporal order and initializing a counter, filmCnt, to zero. The counter filmCnt increments by
one in step 810 for each successive field of 3:2 pulldown encountered and is reset to zero in step
818 whenever a field is determined to be standard video. 

[0064] For each input field indexed by n, the 3:2 content and standard video phase confidences,
$R_{film,n}$ and $R_{video,n}$, respectively, are computed in step 804. If a 3:2 pattern has been observed for
more than 5 fields in step 806, then the film confidence, $R_{film,n}$, is simply compared against a
threshold in step 808 to determine whether the latest field is still part of a 3:2 pulldown sequence
or whether the content has reverted to standard video. If a 3:2 pattern has been observed for fewer
than 5 fields in step 806, then the algorithm requires more stringent requirements to be met in
steps 814 and 816. First, the film confidence, $R_{film,n}$, is compared against a larger threshold in
step 814. If the film confidence is larger than this threshold, the film confidence is then
compared against the standard video confidence, $R_{video,n}$, in step 816 to determine whether the
latest field is still 3:2 content or has reverted to standard video. After each field is processed, the
field counter is incremented in step 812 and the next field in temporal order is processed.
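
The first-stage logic described above might be sketched as follows. The threshold values are hypothetical parameters, the test in step 816 is assumed to require the film confidence to exceed the video confidence, and a run of exactly five fields is treated like a shorter run; none of these details is taken from Fig. 8.

```python
def first_stage_decisions(film_conf, video_conf, low_threshold, high_threshold):
    """Return one "FILM" or "VIDEO" label per input field."""
    decisions = []
    film_count = 0  # corresponds to filmCnt in the text
    for r_film, r_video in zip(film_conf, video_conf):
        if film_count > 5:
            # Established 3:2 run: a single, lower threshold suffices.
            is_film = r_film > low_threshold
        else:
            # Run not yet established: larger threshold, and (assumed
            # direction) film confidence must exceed video confidence.
            is_film = r_film > high_threshold and r_film > r_video
        if is_film:
            film_count += 1
            decisions.append("FILM")
        else:
            film_count = 0
            decisions.append("VIDEO")
    return decisions


film_conf = [0.95] * 7 + [0.7, 0.3, 0.7]
video_conf = [0.5] * 10
labels = first_stage_decisions(film_conf, video_conf,
                               low_threshold=0.6, high_threshold=0.9)
```

In this example a confidence of 0.7 is accepted while the 3:2 run is established but rejected immediately after the run has been reset, which is the hysteresis the two thresholds are meant to provide.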

2nd Decision (Step 508) 

[0065] To better handle 3:2 content/3:2 content and 3:2 content/standard video transitions, one
embodiment of the invention uses an additional five field delay in the 2nd decision step 508 so
that we can consider future as well as past film phase estimates, dropped frame discontinuities
and 3:2 content/standard video decisions, $C'_n$, from the 1st decision step 506 described above.
The five field delay corresponds to the repeated field period of 3:2 content and adds to the M
field delay already introduced with reference to handling splice points above. The output of the
2nd decision step 508 is given by $C_n$.

[0066] Before declaring that the current field belongs to 3:2 content, the 2nd decision step 508
checks the "consistency" of the phase estimates and past 3:2 content/standard video decisions
over a sliding window that includes past and future fields. Specifically, we define $W_n$ as a window
of field indices given by:
[0067] $W_n = \{i : l_{start,n} \le i \le n + l_{end,n}\}$,

where $l_{start,n}$ is the index of the most recently observed repeated field and $l_{end,n} = 5$. In other
words, the window starts at the last previously detected repeat field (when $\theta_n = 0$) and ends five
fields in the future. For field n, the 3:2 phase estimate, $\theta_n$, is said to have incremented
consistently if $\mathrm{mod}(\theta_n + 5 - \theta_{n-1}, 5) = 0$, which essentially means that the 3:2 phase estimates are
incrementing in the manner consistent with 3:2 content. If 1) no dropped frames are detected, 2)
all the phases are consistent over the window, $W_n$, and 3) $C'_n$ = FILM for all fields in $W_n$, then $C_n$
is determined to be 3:2 content. If a single phase inconsistency or a single dropped frame is
detected, the decision is 3:2 content in transition, and otherwise the decision is standard video.
Accordingly, any one or more of pattern matching steps 502, 504, 506, and 508 may be used to
determine in step 206 whether the input is 3:2 content. 3:2 content and 3:2 content in transition
are promoted to field pair formation step 208. Standard video is promoted to field pair formation
step 210.

Field-Pair Formation for 3:2 Content (Step 208) 

[0068] 3:2 content and 3:2 content in transition are processed in step 208. Step 208 uses the
mode decision, $C_n$, and phase estimate, $\theta_n$, from the output of the pattern matching step 204 to
determine the pairing of the output fields and to set the repeat first field and progressive frame
flags.

[0069] As mentioned earlier, this process is rather straightforward if the 3:2 pulldown pattern
is uninterrupted. However, for splice points, the proper pairing and flag selection is more 
challenging. A feature of this stage of the processing is that all possible splice points are 
considered. In one embodiment, step 208 is executed in accordance with the table illustrated in 
Fig. 9. 

[0070] As shown therein, column 902 shows input field sequences $\{X_n\}$ for each of ten possible
phase sequences $\{\theta_n\}$ shown in column 904. Except for the first row, all listed input field
sequences $\{X_n\}$ in column 902 represent possible splice points between two 3:2 pulldown clips.
For each possible input field sequence $\{X_n\}$ shown in column 902, an optimal series of output
field pairs $Y_m$ is provided in column 906, together with appropriate states for the progressive
frame flag and the repeat first field flag.

[0071] One embodiment of field pair formation step 208 identifies an input field sequence $\{X_n\}$
as one of the possible input field sequences $\{X_n\}$ listed in column 902 in order to form an
appropriate field pair sequence $Y_m$, and in order to set the progressive frame flags and the repeat
first field flags according to the corresponding solution provided in column 906.

[0072] In the alternative, or in combination, an embodiment of field pair formation step 208
identifies an input field sequence $\{X_n\}$ having a phase sequence $\{\theta_n\}$ illustrated by one of the
cases shown in column 904 in order to form an appropriate field pair sequence $Y_m$ and to set the
progressive frame flags and the repeat first field flags according to the corresponding solution
provided in column 906.

Field-Pair Formation for Standard Video (Step 210) 

[0073] Standard video is processed in step 210. In one embodiment, step 210 is executed in
accordance with the process depicted in Fig. 10. As shown therein, a first field $X_n$ and a second
field $X_{n-1}$ are offset to generate $X_{2n+1}$ and $X_{2n}$, respectively. Thus, for input data of
$X_0$, $X_1$, $X_2$, $X_3$, etc., the output field pairs would be $Y_0 = (X_0, X_1)$, $Y_1 = (X_2, X_3)$, etc. In the
case of standard video, the progressive frame flag and the repeat first field flag are not applicable
and are set to zero.
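
The pairing of standard video fields might be sketched as follows. The dictionary layout and the function name are hypothetical, and dropping a trailing unpaired field is an assumption made only to keep the example short.

```python
def pair_standard_video(fields):
    """Pair consecutive fields so that pair m holds fields 2m and 2m+1.

    Both flags are set to zero because they do not apply to standard video.
    Assumption: a trailing unpaired field is simply left out.
    """
    pairs = []
    for m in range(len(fields) // 2):
        pairs.append({
            "fields": (fields[2 * m], fields[2 * m + 1]),
            "progressive_frame": 0,
            "repeat_first_field": 0,
        })
    return pairs


pairs = pair_standard_video(["X0", "X1", "X2", "X3", "X4"])
```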

Steps 206, 208, and 210

[0074] Any and all of the methods and algorithms described above can be implemented via
software. The software can be programmed in a variety of languages, and stored in a variety of 
formats, for processing by a video preprocessor or other processor capable of executing the 
software. 

[0075] Fig. 11 illustrates pseudo code to implement one embodiment of conditional step 206 and
field pair formation steps 208 and 210. As shown therein, $C_n$ and $\theta_n$ are read. If $C_n$ indicates 3:2
content (FILM) or 3:2 content in transition (FILM IN TRANSITION), then $\theta_n$ is used to output
field pairs and set the progressive frame flag and the repeat first field flag for each field pair.
Otherwise, standard video fields are paired for output.

Summary 

[0076] The invention described above thus overcomes the disadvantages of known methods by 
providing improved techniques for determining field differences, and for identifying progressive 
frames and repeated fields in video format, especially in cases where the source data is noisy, or
where the data pattern is interrupted or otherwise altered. The detection of video format is
advantageously improved compared to known methods. 
[0077] While this invention has been described in various explanatory embodiments, other
embodiments and variations can be effected by a person of ordinary skill in the art without 
departing from the scope of the invention. In particular, many of the features disclosed herein 
can be used in combination with other methods related to the detection of 3:2 pulldown or other 
data formats. 